IN THE UNITED STATES PATENT AND TRADEMARK OFFICE


APPLICANT: CLASS, ET AL.

SERIAL NO.: to be assigned

FILED: herewith

TITLE: METHOD FOR VOICE RECOGNITION USING A GRAMMAR

ART UNIT: not yet known

EXAMINER: not yet known


Assistant Commissioner for Patents 


Washington, D.C. 20231 


Sir: 


PRELIMINARY AMENDMENT 


Please amend the above-identified application before a first consideration on the 
merits as follows: 

IN THE DRAWINGS: 

Please replace Figs. 1-3 and Fig. 5 with the amended Figs. 1-3 and 5 submitted 
herewith. 

IN THE SPECIFICATION: 

On page 1, delete line 1. 

On page 1, before line 2, insert --Field of the Invention--.

On page 1, before line 5, insert --Related Technology--.

On page 1, line 15, after "Patent No. 195 01 599 C1", insert --, which is hereby
incorporated by reference herein,--.

On page 2, after line 5, insert --A combination of linguistic detection models
with phrase grammars and N-gram detection models in one language model is described in a
publication of Meteer et al.: "Statistical Language Modeling Combining N-Gram and
Context-Free Grammars," Speech Processing, Minneapolis, April 27-30, 1993, Vol. 2,
pp. II-37 to II-40, XP000427719, IEEE.

A publication of Kenji Kita: "Incorporating LR Parsing into Sphinx," ICASSP 91, 
Speech Processing 1, Toronto, May 14-18, 1991, Vol. 1, pp. 269-272, XP000245219, IEEE, 
describes a speech detection method that begins with a context-free grammar. If the parser 
can find a result with the context-free grammar, the digram grammar is not used. If a 
syntactically correct result is not present, a changeover is made to the digram grammar.-- 

On page 2, before line 6, insert --Summary of the Invention--.

On page 2, line 6, change "the object" to --an object--.

On page 2, delete lines 9 and 10. 

On page 2, before line 11, insert --The present invention provides a method for
recognizing speech from word sequences assembled from multiple words of a given
vocabulary, in which a first recognition method and a second recognition method are
provided. A first recognition method and a second recognition method are applied to separate
segments of a word sequence that is to be recognized. A recognition method with integrated
unique syntax is applied as the first method and a recognition method with statistical word
sequence evaluation is applied as the second recognition method. Upon a change from the
digram recognition method with integrated unique syntax to the second recognition method
with statistical word sequence evaluation, the last two words of the segment processed using
the first method are combined into one pseudoword that is processed using a digram detection
method.--

On page 2, line 15, delete "What is". 

On page 2, line 16, change "essential about the combination is that" to --According to
the present invention,--.

On page 4, before line 8, insert --Brief Description of the Drawings--.
On page 4, line 9, delete "to preferred exemplary embodiments referring".
On page 4, line 17, after "example", insert --based--.
On page 4, before line 19, insert --Detailed Description--.
On page 4, line 20, change "Figures" to --drawings--.

IN THE ABSTRACT: 

On line 1, change "The invention relates to a" to --A--, and change "bigram" to
--digram--.

IN THE CLAIMS:

Please cancel without prejudice original claims 1-10, the substitute claims 1-6
annexed to the International Preliminary Examination Report, and add new claims 11-20 as
follows:

11. (new) A method for recognizing speech from a word sequence, the method comprising:

applying a first recognition procedure to a first segment of the word sequence, the first 
segment including a plurality of first words; 

applying a second recognition procedure to a second segment of the word sequence, 
the second segment including a plurality of second words; 

combining a last two words of the plurality of first words into a pseudoword upon a 
change from the first recognition procedure to the second recognition procedure; and 

processing the pseudoword using a digram detection method. 

12. (new) The method as recited in claim 11 wherein the first recognition procedure includes
an integrated unique syntax procedure and the second recognition procedure includes a 
statistical word sequence procedure. 

13. (new) The method as recited in claim 12 wherein the first recognition procedure is a 
digram recognition procedure and the second recognition procedure is a trigram recognition 
procedure and wherein the second recognition procedure limits permissible series of second 
words in the second segment according to a statistical evaluation. 

14. (new) The method as recited in claim 12 wherein at least one of the first and second 
segments is predefined in terms of at least one of a respective segment length and segment 
position. 

15. (new) The method as recited in claim 14 wherein at least one of the first and second 
segments is permanently allocated to one of the first and the second recognition procedure. 

16. (new) The method as recited in claim 15 wherein the first segment has a predefined 
length and is positioned at a beginning of the word sequence.

17. (new) The method as recited in claim 12 wherein the second segment has a predefined
length and is positioned at a beginning of the word sequence. 

18. (new) The method as recited in claim 13 wherein the applying the second recognition 
procedure includes: 

recognizing a word triplet, the word triplet including three second words of the 
plurality of second words; and 

representing the word triplet as a pseudoword doublet, the pseudoword doublet 
including a second and a third pseudoword, the second pseudoword overlapping with the 
third pseudoword and each of the second and third pseudowords including two of the three
second words of the word triplet. 

19. (new) The method as recited in claim 12 wherein a change from the second recognition 
procedure to the first recognition procedure is performed based on a respective word 
detection or phrase detection. 

20. (new) The method as recited in claim 19 wherein the second recognition procedure is 
used as standard. 

REMARKS 

This Preliminary Amendment cancels without prejudice original claims 1-10 and the 
substitute claims 1-6 annexed to the International Preliminary Examination Report in the 
underlying PCT Application No. PCT/DE98/03536 (a translation of which is submitted 
herewith), and adds new claims 11-20. The new claims do not add new matter to the
application but do conform the claims to U.S. Patent and Trademark Office rules. 

The amendments to the specification, drawings, and abstract are also to conform the 
specification and abstract to U.S. Patent and Trademark Office rules. It is respectfully 
submitted that the amendments to the specification, drawings, and abstract also do not 
introduce new matter into the application. 

The underlying PCT application includes a Search Report, a copy of which is also 
submitted herewith.

Conclusion

Consideration of the present application as amended is hereby respectfully requested. 


Respectfully Submitted, 
Kenyon & Kenyon 






Richard L. Mayer (Reg. No. 22,490) 

One Broadway 

New York, NY 10004 

(212) 425-7200 (tel.) 

(212) 425-5288 (fax)

METHOD FOR RECOGNIZING SPEECH USING A GRAMMAR
Description 

The present invention relates to a method for recognizing speech from word 
sequences assembled from multiple words of a given vocabulary. 

The error rate for recognition of continuously spoken speech that permits any
desired combination of all words rises considerably by comparison with individual
word recognition. To counteract this, knowledge about permissible word sequences is
stored in so-called language models, and used during recognition in order to reduce
the number of word sequences.

Language models are usually defined as so-called N-gram models, N
designating the depth of the model; in other words, N successive words within a word
sequence are taken into account during the current evaluation. Because the
complexity of the recognition process rapidly rises with increasing values of N,
digram (N = 2) and trigram (N = 3) language models are the ones principally used.
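
As a minimal sketch of the N-gram windows described above, the following Python
fragment lists the groups of N successive words that a digram or trigram model would
take into account. The helper name `ngrams` and the sample letter sequence are
illustrative assumptions and do not come from the application.

```python
from typing import List, Sequence, Tuple


def ngrams(words: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every window of n successive words in the sequence."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Illustrative input: the spelled-out letters of one city name.
letters = list("AALEN")
digrams = ngrams(letters, 2)   # N = 2
trigrams = ngrams(letters, 3)  # N = 3
```

The number of distinct windows a model must evaluate grows quickly with N, which is
the complexity trade-off the paragraph above refers to.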

German Patent No. 195 01 599 C1 describes, in addition to various previously
known methods for speech recognition, a method that allows the storage in a digram
language model of phrases having fixed syntax and any desired length N. The method
integrates knowledge about the syntax of permitted phrases (word sequences) into the
language model, and is therefore also referred to as a "syntactic digram." An essential
element for integrating syntax into the language model is the indexing of words that
occur more than once in different phrase constellations. As a result, the speech
recognition system is identical with and without integrated syntax.

With the severe limitation of the permissible word sequences and a limited
number of permitted phrases, the speech recognition system operating according to
the syntactic digram language model achieves a high recognition rate but is also
usable only if syntactic limitations can be reliably defined and adhered to, for example
in the case of short commands, date or time inputs, and the like. If the number of
permitted word sequences is large, however, complete definition of the syntax is very
laborious; and in situations where spontaneously formulated word sequences also
need to be recognized, and in which there is no guarantee that syntactic limitations
will be observed, recognition using a strictly syntactic language model is of only
limited suitability.

It is therefore the object of the present invention to describe a method for 
recognizing speech that offers an expanded area of application compared to existing 
methods, with a good recognition rate. 

The present invention is described in Claim 1. The dependent claims contain
advantageous embodiments and developments of the present invention.

The combined utilization of two different recognition methods, in particular
having different degrees of syntactic limitation, preferably of recognition methods
based on a language model with unique syntax on the one hand, and of a statistical
N-gram language model on the other hand, results, surprisingly, in a considerably
expanded area of application, yielding a variety of possible combinations. What is
essential about the combination is that successive word sequence segments of a
cohesive word sequence are processed using different recognition methods.
Depending on the area of application, a different division of the overall word
sequence into segments, and use of the various recognition methods, may be
advantageous. Here and hereinafter, what is meant by "words" is not
only words in the linguistic sense as sound sequences having a demonstrable
conceptual content; "words" are rather to be understood in general as sound sequences
processed integrally in the speech recognition system, for example including the
speaking of individual letters, syllables, or syllable sequences without a specific
conceptual assignment.

When a word sequence is divided into one or more segments, it is possible in
particular to predefine at least one segment in terms of position and/or length. A
predefined segment of this kind can be positioned, in particular, at the beginning of a
word sequence, and can also have a fixed length in terms of the number of words that
it encompasses. Advantageously, the recognition method with the integrated unique
syntax can then be allocated to this segment. Because of the limited length of the
segment, the outlay in terms of syntax definition and processing using the recognition
method with integrated unique syntax remains within acceptable limits. At the same
time, the number of plausible word sequences can be considerably limited because the
syntax is defined and is taken into account in the first segment. One advantageous
field of application of this is the input of concepts by spelling. For example, it is
possible to recognize several tens of thousands of different city names by spelled-out
speech input, with a surprisingly high recognition rate and little outlay, by combining
an initial segment of fixed length that is processed on the basis of a recognition
method with integrated unique syntax, and further processing of the speech input
following that segment using a statistical N-gram recognition method, in particular a
digram or trigram recognition method. If exclusively a recognition method with
integrated unique syntax were used, the outlay for syntax integration and processing
would greatly exceed tolerable limits. On the other hand, the exclusive use of a
statistical language model in such cases would yield inadequate recognition rates.

Other advantageous examples of the segment-wise utilization of a recognition
method with integrated unique syntax include word sequences with date or time
information, whose word environment can then advantageously be processed with a
statistical language model.

It is particularly advantageous if a statistical language model is combined with
a language model with integrated syntax limitation even for the recognition of word
sequences in which recurrent characteristic terms or phrases can be expected. In this
context, the statistical recognition method is preferably used as the standard
procedure; and if the word flow is monitored in a manner known per se for specific
terms or phrases ("word spotting" or "phrase spotting"), it is possible, when such
terms or phrases are detected, to initiate a segment in which speech recognition is
performed using the detection method with integrated unique syntax. This segment
can possess a fixed or variable length, which in particular can also be adapted to the
respective term or phrase. After the completion of this segment, if the word sequence
continues, it is then possible to change back to the standard recognition method with
statistical word sequence evaluation.
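
The switching behavior described above can be sketched roughly as follows. This is a
simplified illustration under stated assumptions, not the application's
implementation: the function name `assign_methods`, the trigger table, and the sample
utterance are hypothetical, the spotted term is assumed to be the first word of the
syntactic segment, and the segment length is taken from the table per trigger term.

```python
from typing import Dict, List, Sequence


def assign_methods(words: Sequence[str], triggers: Dict[str, int]) -> List[str]:
    """Label each word with the recognition method that would process it.

    triggers maps a spotted term to the length (in words) of the syntactic
    segment it opens. Assumption: the segment starts at the spotted term.
    """
    labels: List[str] = []
    remaining = 0  # words left in the current syntactic segment
    for word in words:
        if remaining == 0 and word in triggers:
            remaining = triggers[word]  # word spotting opens a segment
        if remaining > 0:
            labels.append("syntactic")
            remaining -= 1
        else:
            labels.append("statistical")  # standard procedure
    return labels


# Hypothetical utterance: "on" opens a three-word date phrase.
utterance = "please call me on may third thanks".split()
labels = assign_methods(utterance, {"on": 3})
```

Once the three-word segment is complete, the labels fall back to the statistical
method, mirroring the change back to the standard procedure.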

For the recognition method with integrated unique syntax, it is preferable to
use the syntactic digram recognition method known from the existing art cited
initially. For the statistical speech recognition method with word sequence
evaluation, a digram recognition method is then also advantageous for application of
an integral speech recognition system. On the other hand, a statistical recognition
method with a higher value of N yields an improved detection rate, but also requires
greater processing outlay. An advantageous compromise is to use a trigram
recognition method for the statistical recognition method; a preferred embodiment of
the present invention provides for performing recognition with the information
volume of a trigram recognition method, in the form of digram processing.

The present invention is illustrated in even further detail below with reference
to preferred exemplary embodiments referring to the drawings, in which:

Figure 1 shows a simple processing sequence diagram using the example
of a spelled-out speech input;

Figure 2 shows a network graph according to the existing art;

Figure 3 shows the graph of Figure 2 with additional syntactic
limitation;

Figure 4 shows the beginning of the graph of Figure 3 utilizing the
present invention; and

Figure 5 shows an expanded example on the principle of Figure 4.

The example selected for explanation of the present invention with reference
to the Figures is spelled-out speech input of city names. The lexicon of a spelling
recognition system to be used for this purpose comprises approximately 30 letters as
well as a few additional words such as "double" or "dash." The list of city names
contains, for example, several tens of thousands of entries, so that complete storage of
the unique syntactic information (in this case the letter sequences) would increase the
magnitude of the lexicon containing the syntactic information, and the computing
time required for recognition, to unacceptable levels.

The sequence diagram sketched in Figure 1 for the recognition of a spelled-out
entry with no parameters of any kind indicates, by way of the arrows, that proceeding
from a Start node, the word sequence (which, in the particular example selected, is a
sequence of individually pronounced letter names) can begin with any one of the
letters provided for, and any letter can be followed by any other letter unless the word
sequence has already ended, as represented by the End node.

In the conventional network graph depiction, network paths are shown, for
example, for the German city names Aachen, Aalen, and Amberg. As set forth in
German Patent No. 195 01 599 C1 already cited as existing art, in a network graph of
this kind the identical word nodes (letters) occurring at various positions of the
network yield not only the plausible word sequences provided for by the network
paths, but also a plurality of nonsense word sequences that nevertheless qualify as
permissible according to the language model.

To eliminate this problem, German Patent No. 195 01 599 C1 proposes to use
indexing in order to distinguish those word nodes which occur more than once in the
network. Indexing makes all the word nodes of the network unique, and for each
word node it is possible to indicate completely, as the syntax describing the totality of
all permissible word sequences, the permissible subsequent word nodes. Especially in
the case of spelled-out input of terms from a long list of terms, the ambiguity of the
network graph without indexing is enormous.
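
The effect of indexing can be illustrated with a small sketch. It is only an
illustration under assumptions: the function names are hypothetical, and the index
chosen here (the letter prefix spoken so far) is one simple way to make repeated
letters unique; the cited patent may index word nodes differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

START, END = "<s>", "</s>"


def plain_path(word: str) -> List[str]:
    """Path over bare letters, so repeated letters share one node."""
    return [START, *word, END]


def indexed_path(word: str) -> List[str]:
    """Path over indexed nodes; the index is the prefix so far (assumption)."""
    return [START] + [word[:i] for i in range(1, len(word) + 1)] + [END]


def build(paths: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Digram successor table: for each node, the permitted next nodes."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        for prev, nxt in zip(path, path[1:]):
            table[prev].add(nxt)
    return table


def accepts(table: Dict[str, Set[str]], path: List[str]) -> bool:
    """True if every digram along the path is permitted by the table."""
    return all(nxt in table.get(prev, set()) for prev, nxt in zip(path, path[1:]))


names = ["AACHEN", "AALEN", "AMBERG"]
plain = build(plain_path(n) for n in names)
indexed = build(indexed_path(n) for n in names)

# "AMBEN" is a nonsense sequence assembled from digrams of different names.
nonsense_plain = accepts(plain, plain_path("AMBEN"))
nonsense_indexed = accepts(indexed, indexed_path("AMBEN"))
```

Without indexing the nonsense sequence passes every digram check; with unique nodes it
is rejected while the three listed names remain permissible.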

Based on the example of Figure 3, Figure 4 depicts the procedure according to
the present invention. What is selected, for purposes of illustration, is a variant of the
present invention in which at the beginning of the word sequence, a segment of
constant predefined length is processed using a recognition method with unique
syntax integration, and a changeover is then made to a statistical recognition method
with word sequence evaluation. The basis for the recognition method with unique
syntactic limitation is a syntactic digram recognition method. The length of the
introductory segment at the beginning of the word sequence is assumed to be k = 3
words. It is assumed for the subsequent segment of the word sequence, whose length
is a priori not known or limited, that a statistical recognition method with word
sequence evaluation, and with the information depth of a trigram method, will be
used. In order to illustrate a particularly preferred embodiment of the present
invention, a description will also be given of processing of the trigram information
using a digram recognition method, in that the information volume of three
words (word triplet) present inside the trigram window is divided into two
overlapping pseudowords (word doublet) that each comprise a combination of two
successive words of the underlying trigram window.

In the example sketched in Figure 4, proceeding from the Start node, at the
beginning of a word sequence a syntactic digram recognition method is applied in a
manner known from the existing art. For the city names entered in Figures 2 and 3 as
network paths:

AACHEN
AALEN
AMBERG,

this means that the first three individually spoken letters

AAC
AAL
AMB

are processed with the syntactic digram recognition method. For processing of the
subsequent word sequence segment using a trigram recognition method, it is
advantageous if the information from the first segment can also already be evaluated
as history for the beginning of the second segment. For processing with the
information depth of a trigram, this means that the letter sequences

ACHEN
ALEN
MBERG

should advantageously be available with trigram information depth.
The processing in the second segment of the word sequence entered in spelled-out
fashion therefore advantageously also includes the last two letters of the first segment.

It is particularly advantageous if the same speech recognition system can be
used in all successive segments. For this purpose, in the second segment the
information present with trigram information depth is now processed using a digram
recognition method. This is done by reshaping the word triplet of the trigram
window, which is shifted stepwise in sliding fashion along the word sequence, into a
pseudoword doublet in which each two adjacent words of the word triplet of the
trigram window are combined into one pseudoword. For the examples selected, the
result is thus a sequence of pseudowords of the following type:

AC CH HE EN
AL LE EN
MB BE ER RG,

in which each two successive pseudowords (letter pair) contain the speech
information of a word triplet from one trigram window. Reshaping the word triplets
into pseudoword doublets makes possible digram processing, which takes into
account only two successive pseudowords in each case, while retaining the trigram
information depth. Because digram processing is used in the second segment as well,
the design of the speech recognition system remains the same over the entire word
sequence.
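
The reshaping of word triplets into overlapping pseudoword doublets can be sketched
as follows. The function names and the tuple representation of a pseudoword are
illustrative assumptions; only the letter sequences come from the description.

```python
from typing import List, Sequence, Tuple

Pseudoword = Tuple[str, str]


def to_pseudowords(words: Sequence[str]) -> List[Pseudoword]:
    """Combine each two adjacent words into one pseudoword."""
    return list(zip(words, words[1:]))


def doublet_to_triplet(left: Pseudoword, right: Pseudoword) -> Tuple[str, str, str]:
    """Recover the word triplet covered by two successive pseudowords."""
    if left[1] != right[0]:
        raise ValueError("pseudowords do not overlap")
    return (left[0], left[1], right[1])


pseudo = to_pseudowords("ACHEN")
printable = ["".join(p) for p in pseudo]
first_triplet = doublet_to_triplet(pseudo[0], pseudo[1])
```

A digram model over these pseudowords only ever looks at two successive units, yet
each such pair spans three original letters, which is how the trigram information
depth is retained.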

For the transition from the first segment with processing based on a syntactic
digram recognition method to the second segment with processing based on the
pseudoword digram recognition method without syntactic limitation, it is
advantageous if, in the first segment, the last word node has added to it the
information of the previous word node; this results, in the first segment, in a sequence
of word nodes (letters) of the following kind:

A A AC
A A AL
A M MB;

the last word node once again constitutes a pseudoword with the information of the
previous node.
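
A minimal sketch of the resulting node sequence, assuming an introductory segment of
k words whose last node is merged with its predecessor, might look like this. The
function name `hybrid_nodes` is hypothetical, and pseudowords are represented simply
by concatenating the two letters.

```python
from typing import List, Sequence


def hybrid_nodes(words: Sequence[str], k: int = 3) -> List[str]:
    """Node sequence for one word sequence.

    The first k - 1 nodes are single words; from the k-th word onward every
    node is a pseudoword made of the previous word and the current word.
    """
    if k < 2 or len(words) < k:
        raise ValueError("need k >= 2 and at least k words")
    head = list(words[:k - 1])
    tail = [words[i - 1] + words[i] for i in range(k - 1, len(words))]
    return head + tail


aachen = hybrid_nodes("AACHEN")
amberg = hybrid_nodes("AMBERG")
```

The third node of each sequence (AC, MB) is the transition pseudoword; the nodes after
it are the pseudowords processed in the second segment.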

Figure 5 depicts a portion, configured using this principle, of the network
graph for the examples also selected in Figures 2 and 3. Proceeding from a Start
node, in the first segment the network is built up with individual word nodes
(individual letters) which then, at the transition to the second segment, transition into
pseudoword nodes each having the information volume of two successive letters. The
transitions between the pseudoword nodes are evaluated, in a manner known per se,
on the basis of learning samples. The resulting network graph comprises a
combination of the two different recognition methods. Despite the considerably
greater number of distinguishable pseudowords as compared to the number of
different letters, dispensing with continuous application of a syntactic limitation over
the entire network results in a considerable reduction in processing outlay, with a high
recognition rate.
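
The description leaves open how the transitions between pseudoword nodes are
evaluated from learning samples. One common choice, shown here purely as an assumed
illustration, is a relative-frequency estimate over pseudoword pairs; the function
name `train_transitions` and the three-name sample set are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train_transitions(samples: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next pseudoword | current pseudoword) by relative frequency."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample in samples:
        pseudo = [a + b for a, b in zip(sample, sample[1:])]
        for left, right in zip(pseudo, pseudo[1:]):
            counts[left][right] += 1
    return {
        left: {right: c / sum(nxt.values()) for right, c in nxt.items()}
        for left, nxt in counts.items()
    }


probs = train_transitions(["AACHEN", "AALEN", "AMBERG"])
```

With only three samples the estimates are coarse; a real system would be trained on a
much larger list and would typically smooth unseen transitions.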

In the example of Figure 5, arrows from each of the pseudoword nodes to the
End node indicate that even after only a portion of the entire word sequence, the
speech input may already be sufficient for allocation of a term from the predefined
list. In a recognition system, this can be implemented by the fact that once the
number of terms considered relevant after input of a portion of the word sequence has
been sufficiently limited, the recognition system offers a selection of terms (on a
display, for example) so that input can thereby be shortened.
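
The shortening of input can be sketched as a simple prefix filter over the predefined
list. This is an assumed simplification: the function name `offer_selection` and the
threshold `max_choices` are hypothetical, and a real recognizer would rank
hypotheses by score rather than rely on exact prefix matches.

```python
from typing import List, Optional, Sequence


def offer_selection(
    names: Sequence[str], spelled: str, max_choices: int = 2
) -> Optional[List[str]]:
    """Return the remaining candidates once they are few enough to display.

    Returns None while too many (or no) terms still match the letters so far.
    """
    matches = [name for name in names if name.startswith(spelled)]
    if 0 < len(matches) <= max_choices:
        return matches
    return None


CITIES = ["AACHEN", "AALEN", "AMBERG"]
after_one_letter = offer_selection(CITIES, "A")
after_two_letters = offer_selection(CITIES, "AA")
```

After one letter all three names still match, so nothing is offered; after two letters
the list is short enough to present for selection.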

The present invention is not limited to the exemplary embodiments described, 
but rather can be modified in various ways in the context of the capabilities of one 
skilled in the art. In particular, the degree to which syntactic information is taken into 
account in the second method is variable.

Claims


1. A method for recognizing speech from word sequences assembled from
multiple words of a given vocabulary, in which a first recognition method and a 
second recognition method are provided for application to separate segments of a 
word sequence that is to be recognized. 

2. The method as defined in Claim 1, characterized in that the first recognition 
method is a recognition method with integrated unique syntax. 

3. The method as defined in one of Claims 1 through 4 [sic], characterized in that 
the first method is a digram recognition method with integrated unique syntax. 

4. The method as defined in one of Claims 1 through 3, characterized in that the 
second recognition method is a recognition method with statistical word sequence 
evaluation. 

5. The method as defined in Claim 4, 

characterized in that the second method is a trigram recognition method in 
which the permissible word sequences are limited by way of a purely 
statistical evaluation. 

6. The method as defined in Claim 5, characterized in that the word triplet of the 
trigram window is represented as a pseudoword doublet, the two pseudowords of a 
doublet overlapping and each containing two words of the corresponding triplet. 

7. The method as defined in Claim 6, 

characterized in that upon a change from the first recognition method with 
integrated unique syntax to the second recognition method with statistical 
word sequence evaluation, the last two words of the segment processed using 
the first method are combined into one pseudoword.

8. The method as defined in one of Claims 1 through 7, characterized in that at
least one segment is predefined in terms of its position and/or its length, and is 
permanently allocated to one of the alternative recognition methods. 

9. The method as defined in Claim 8, 

characterized in that a segment of predefined length at the beginning of the 
phrase is processed using the first recognition method with integrated syntax. 

10. The method as defined in one of Claims 1 through 8, characterized in that the
second recognition method without integrated syntax is utilized as standard, and a 
changeover to the first recognition method with integrated syntax is made on the basis 
of word detection (word spotting) or phrase detection (phrase spotting). 




Abstract 


The invention relates to a method for voice recognition, wherein a bigram 
method with integrated unequivocal syntax restriction is combined with an N-gram 
voice model with statistical word sequence evaluation in such a way that alternative 
recognition methods can be used in different segments of a word sequence.

which a patent is sought on the invention entitled METHOD FOR VOICE RECOGNITION
USING A GRAMMAR, the specification of which was filed as International Application No.
PCT/DE98/03536 on 2 December 1998

application in accordance with Title 37, Code of Federal Regulations, § 1.56(a).

(212) 425-7200 (phone)

made on information and belief are believed to be true; and further that these statements were made
with the knowledge that willful false statements and the like so made are punishable by fine or
imprisonment, or both, under § 1001 of Title 18 of the United States Code and that such willful
statements may jeopardize the validity of the application or any patent issuing thereon.